computing resource
Elon Musk is making a big bet on his future vision – will it work?
Reports suggest that Elon Musk is eyeing up a merger involving SpaceX, Tesla and xAI, but what does he hope to achieve by consolidating his business empire? Elon Musk is a busy man, heading up multiple billion-dollar companies. While he is increasingly a divisive figure, there is no doubt that Tesla and SpaceX, his two most important ventures, have done much to advance the future of electric cars and spacecraft, respectively. But a series of corporate moves this week suggests Musk has a new vision of the future – and he may be combining all his companies to get there.
- North America > United States > Tennessee > Shelby County > Memphis (0.05)
- North America > United States > Michigan (0.05)
- Europe > Switzerland (0.05)
- Asia > China (0.05)
- Health & Medicine > Therapeutic Area (0.50)
- Government (0.50)
D-LLM: A Token Adaptive Computing Resource Allocation Strategy for Large Language Models
Large language models have shown an impressive societal impact owing to their excellent understanding and logical reasoning skills. However, such strong ability relies on a huge amount of computing resources, which makes it difficult to deploy LLMs on computing resource-constrained platforms. Currently, LLMs process each token equivalently, but we argue that not every word is equally important. Some words should not be allocated excessive computing resources, particularly dispensable terms in simple questions. In this paper, we propose a novel dynamic inference paradigm for LLMs, namely D-LLMs, which adaptively allocates computing resources in token processing. We design a dynamic decision module for each transformer layer that decides whether a network unit should be executed or skipped. Moreover, we tackle the issue of adapting D-LLMs to real-world applications, specifically concerning the missing KV-cache when layers are skipped. To overcome this, we propose a simple yet effective eviction policy to exclude the skipped layers from subsequent attention calculations. The eviction policy not only makes D-LLMs compatible with prevalent applications but also saves considerable storage.
Multi-Lingual Acquisition on Multimodal Pre-training for Cross-modal Retrieval
Vision and diverse languages are important information sources in our living world. A model that understands multiple modalities and languages can be applied to a wider range of real-life scenarios. To build such a multimodal and multilingual model, existing works ensemble vision-language data from multiple languages during pre-training. However, due to the large number of languages, these works often require huge computing resources and cannot be flexibly extended to new languages. In this work, we propose a MultiLingual Acquisition (MLA) framework that can easily empower a monolingual Vision-Language Pre-training (VLP) model with multilingual capability. Specifically, we design a lightweight language acquisition encoder based on state-of-the-art monolingual VLP models. We further propose a two-stage training strategy to optimize the language acquisition encoder, namely the Native Language Transfer stage and the Language Exposure stage. With much less multilingual training data and computing resources, our model achieves state-of-the-art performance on multilingual image-text and video-text retrieval benchmarks.
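A rough sketch of the idea, with all module names and sizes as illustrative assumptions rather than the paper's code: a small trainable encoder maps a new language into the embedding space of a frozen monolingual VLP text encoder, and only that small encoder is updated in both training stages.

```python
import torch
import torch.nn as nn

class LanguageAcquisitionEncoder(nn.Module):
    """Lightweight encoder for a new language (illustrative assumption,
    not the paper's implementation): token embeddings plus one small
    adapter layer, producing features in the frozen VLP's text space."""
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.adapter = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=2 * dim, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.embed(token_ids))

def freeze_backbone(vlp_text_encoder: nn.Module) -> None:
    """Both stages (Native Language Transfer, then Language Exposure)
    train only the acquisition encoder; the monolingual VLP backbone
    stays frozen, which is what keeps data and compute needs low."""
    for p in vlp_text_encoder.parameters():
        p.requires_grad = False
```

Because the backbone is frozen, adding one more language only costs one more small acquisition encoder, which is the flexibility the abstract contrasts against joint multilingual pre-training.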
Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers
Doostmohammadian, Mohammadreza, Gabidullina, Zulfiya R., Rabiee, Hamid R.
In the rapidly evolving research on artificial intelligence (AI), the demand for fast, computationally efficient, and scalable solutions has increased in recent years. The problem of optimizing the computing resources for distributed machine learning (ML) and optimization is considered in this paper. Given a set of data distributed over a network of computing-nodes/servers, the idea is to optimally assign the CPU (central processing unit) usage while simultaneously training each computing node locally via its own share of data. This formulates the problem as a co-optimization setup to (i) optimize the data processing and (ii) optimally allocate the computing resources. The information-sharing network among the nodes might be time-varying, but with balanced weights to ensure consensus-type convergence of the algorithm. The algorithm is all-time feasible, which implies that the computing resource-demand balance constraint holds at all iterations of the proposed solution. Moreover, the solution allows addressing possible log-scale quantization over the information-sharing channels to exchange log-quantized data. For some example applications, distributed support-vector-machine (SVM) and regression are considered as the ML training models. Results from perturbation theory, along with Lyapunov stability and eigen-spectrum analysis, are used to prove the convergence towards the optimal case. As compared to existing CPU scheduling solutions, the proposed algorithm improves the cost optimality gap by more than $50\%$.
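The co-optimization idea, in particular the all-time-feasible resource balance, can be illustrated with a toy consensus update. This is a minimal sketch under simplifying assumptions (a static doubly stochastic weight matrix, scalar allocations, no quantization), not the paper's algorithm:

```python
import numpy as np

def allocate_cpu(grads, W, x0, steps=600, lr=0.05):
    """Toy consensus-type resource allocation: n nodes minimize
    sum_i f_i(x_i) subject to sum_i x_i = C.  Each node nudges its
    allocation toward neighbors with a larger marginal cost; because
    every pairwise exchange is antisymmetric under a balanced weight
    matrix W, sum(x) is preserved at every iteration, which is the
    'all-time feasible' property described in the abstract.

    grads: list of local marginal-cost functions f_i'
    W:     balanced (doubly stochastic) weight matrix
    x0:    any initial allocation with sum(x0) = C"""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = np.array([grads[i](xi) for i, xi in enumerate(x)])
        # x_i += lr * sum_j W_ij (g_j - g_i): a Laplacian-style update
        # that equalizes marginal costs while conserving sum(x).
        x = x + lr * (W @ g - W.sum(axis=1) * g)
    return x
```

For quadratic local costs f_i(x) = a_i x^2 / 2, the fixed point equalizes the marginal costs a_i x_i across nodes, matching the KKT condition of the sum-constrained problem.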
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (4 more...)
- Instructional Material (0.68)
- Research Report (0.64)
- Energy > Power Industry (0.93)
- Information Technology (0.88)
The Role of Computing Resources in Publishing Foundation Model Research
Hao, Yuexing, Huang, Yue, Zhang, Haoran, Zhao, Chenyang, Liang, Zhenwen, Liang, Paul Pu, Zhao, Yue, Sun, Lichao, Kalantari, Saleh, Zhang, Xiangliang, Ghassemi, Marzyeh
Artificial Intelligence (AI) and machine learning (ML) models have made stark advances in the past three years, fueled by the development of foundation models (FM) trained on large-scale multimodal data. Following the public release of several successful FMs (OpenAI (2022); Brown et al. (2020); Bommasani et al. (2022)), FMs such as large language models (LLMs) and vision language models (VLMs) have bridged vision, language, and other modalities. In many Computer Science subfields such as Natural Language Processing (NLP) and Computer Vision (CV), FMs have demonstrated strong compositional performance and generalization capabilities (Awais et al. (2025); Gunter et al. (2024)), emerging as widely-used tools (Bommasani et al. (2022)) that provide flexible backbones for innovation in other fields (Moor et al. (2023); Sartor & Thompson (2025); Firoozi et al. (2024)). Conducting FM research requires significant data, computing, and human resources (Cottier et al. (2024); Maslej et al. (2024); Crawford (2024)). A central concern in the field is whether greater access to such resources directly translates into more impactful research outcomes (Acemoglu (2024); Dodge et al. (2019); OpenAI (2018)), such as more research publications, or higher citation counts (Sinclair et al. (2023); Anjum et al. (2019)). The answer to this question has important implications for how resources are allocated, which research directions are prioritized, and how equitable participation in FM research can be ensured. However, the cost of research is often difficult to quantify due to the lack of uniform disclosure on resource distribution (Bommasani et al. (2024)). Absent widespread disclosure, funding is perhaps most easily characterized by the concrete cost of purchasing or renting hardware (e.g., computing clusters or chips), though there are also costs for software, cloud-storage services, and specialized platforms.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (7 more...)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Overview (1.00)
- Information Technology (0.94)
- Social Sector (0.88)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.55)
- Information Technology (0.93)
- Education (0.67)
A Proofs
Consider binary classification; following our notation, we rewrite Equation 1 of Kobayashi et al. The last few lines follow from the definition of conditional probabilities. Proposition A.2. Assume that the loss function … This claim follows immediately from Lemma A.1, where we show that … In this section, we provide results for instance-level feedback in the MIL setting. We train the model with a binary cross-entropy loss. We bold the highest value, and both values if the standard errors overlap.
06d5ae105ea1bea4d800bc96491876e9-AuthorFeedback.pdf
We thank all the reviewers for their constructive comments. We address the major concerns below. Reproducibility: (1) learning-to-draft details; (2) feature details; (3) a discussion of the computing resources used. The search tree is updated via the four steps of MCTS. The learning rate is set to 0.001 with Adam.
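For context, the four MCTS steps the response refers to (selection, expansion, simulation, backpropagation) can be sketched as follows; the node structure, expansion function, and reward here are illustrative, not the authors' implementation:

```python
import math
import random

class Node:
    """Minimal MCTS node (illustrative sketch)."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(child, c=1.4):
    """UCB1 score; unvisited children are explored first."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(child.parent.visits) / child.visits)

def mcts_step(root, expand, rollout):
    """One iteration of the four canonical MCTS steps."""
    node = root
    while node.children:                       # 1. selection
        node = max(node.children, key=ucb)
    for s in expand(node.state):               # 2. expansion
        node.children.append(Node(s, parent=node))
    leaf = random.choice(node.children) if node.children else node
    reward = rollout(leaf.state)               # 3. simulation
    while leaf is not None:                    # 4. backpropagation
        leaf.visits += 1
        leaf.value += reward
        leaf = leaf.parent
    return root
```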